RAPID STRUCTURE SEARCHES VIA PERMUTED CHEMICAL LINE-NOTATIONS (III):
A COMPUTER-PRODUCED INDEX

by 

Charles E. Granito, John E. Schultz,

Gerald W. Gibson, Alan Gelberg,

R. J. Williams, and E. A. Metcalf

INTRODUCTION

The previous papers in this series have discussed the concept of an
index of permuted Wiswesser chemical line-notations, the significance of a
QUICK-SCAN area, and simple methods for preparing this type of index for a
small file of compounds (up to ca. 5000). It has been pointed out that
the preparation of an index for a large number of compounds would require the
use of a computer. This is the subject of this paper.

The project was started in 1963. At that time, a Univac File Computer
(Model II) was readily available and, therefore, used for preparing the index.
After the program was written and satisfactorily tested on a trial deck of
1000 cards, approximately 55,000 Wiswesser chemical line-notations, on file
in this office, were indexed. A program which achieves this same result has
been written for an IBM 1401 at the T. R. Evans Research Center of the Diamond
Alkali Company. Both programs are available for potential users.

a - To whom all inquiries should be addressed.
b - Present Address: Westminster College, Fulton, Missouri.
c - Present Address: Diamond Alkali Company, Painesville, Ohio.
d - Data Processing Division, Management Science and Data Systems
Office, Edgewood Arsenal, Maryland.


The discussion of a general program to accomplish this permutation
will be divided into four categories:

1. Computer preparation of the index.

2. Cost of preparing an index.

3. The index.

4. Uses of the index.

1. COMPUTER PREPARATION OF THE INDEX.

The input is a single punch-card per compound, containing an accession
number, a two-column screen, and a Wiswesser chemical line-notation. The
program is designed to effect the permutation of the line-notation as each
card is read onto magnetic tape, i.e., the operations required to select
pertinent symbols, generate the scan area, and permute the notation are
accomplished and the results stored prior to acceptance of the next
line-notation. The path followed by a typical line-notation card in these
operations will be discussed rather than giving the step-by-step details of the
less understandable directions and flow charts of the programmer.

For the purpose of this discussion, it is convenient to think of the
input information as occupying one row of a compartment in the core memory
(Memory I, figure 1). For easy visualization, the memory row, which can hold 120
characters, is further divided into four areas: A, B, C, and D (figure 2).
Each area is separated by a blank space to make the final print-out more
readable, and the following arbitrary assignments are made:

a. Area A (8 spaces) for the accession number.

b. Area B (2 spaces) for a prefix to serve as a
screen in the index.

c. Area C (U spaces) for the QUICK-SCAN symbols to be
generated by the computer.

d. Area D (96 spaces) for the line-notation.

Areas A, B, and D contain information read directly from the card. At
the start, area C is empty; it will be filled by symbols, selected by the
computer in its operations on a line-notation, for which an entry will be made
in the index. The line-notation will be found in the last half of area D (spaces
73 through 120); initially, the first half of area D (spaces 25 through 72) is
empty. Space 73, the center of area D, corresponds to the index column of the
listing.
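The placement of a notation in area D can be pictured with a short Python sketch. It only illustrates the description above and is not the original Univac or IBM 1401 program; the function name `place_in_area_d` and its parameters are assumptions introduced here.

```python
AREA_D_START = 25    # first space of area D (1-based, as in the text)
AREA_D_END = 120     # last space of area D
INDEX_COLUMN = 73    # space that corresponds to the index column


def place_in_area_d(notation: str, index_pos: int = 0) -> str:
    """Return the 96-character area D with notation[index_pos] in space 73."""
    width = AREA_D_END - AREA_D_START + 1      # 96 spaces
    offset = INDEX_COLUMN - AREA_D_START       # 48: index column inside area D
    start = offset - index_pos                 # where the notation begins
    if start < 0 or start + len(notation) > width:
        raise ValueError("notation does not fit in area D at this shift")
    return " " * start + notation + " " * (width - start - len(notation))


# A notation as first read from the card: its initial symbol sits in space 73.
area_d = place_in_area_d("T6NJ BQ")
```

With `index_pos=0` the first half of area D is blank and the notation starts at the index column, which is the state of the memory row immediately after a card is read.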

After all of the information on one card has been fed into the input
section of Memory I, the computer is ready to start generating the information for
the QUICK-SCAN area and the permutations of the line-notation. The notation is
transferred to a reserve memory, Memory II (see figure 1). In this transfer
and all subsequent transfers of the notation, the information in Areas A, B,
and C is associated with the notation, but is not operated upon. Memory II
holds the notation for a series of actions:

a. The notation is duplicated into Memory III, i.e.,
the information in spaces 73-120 of area D is
duplicated.

b. In Memory III, the symbol in space 73, the indexing symbol,
is compared to an exclusion list. (A list of symbols used
in the Wiswesser chemical line-notations that would not be
useful as indexing terms.) If the symbol (space 73) is not
found on the list, the entire block of information is
considered to be an index entry and is returned to
Memory I for storage. Simultaneously, the index symbol
(space 73) is duplicated and sent to Memory IV where it
is held until all entries for this notation have been
prepared. The initial symbol of a notation is always an
index entry.

c. As the first block is accepted by Memory I a signal is
sent to Memory II and again the notation is transferred
to Memory III. However, this time the transfer involves
a position change (one space left), so that the
line-notation now occupies the 72-119 block, and the new symbol in
space 73 is compared to the exclusion list. If the symbol
is acceptable, the course of action is the same as for the
first block of information. If the symbol is not a
desirable indexing term, the information in Memory III is erased
and Memory II is alerted by signal to continue the process.

d. As soon as Memory III compares two successive spaces
containing no symbols (indicating the end of the line-notation) or after
space 120 (last position), this part of the operation ceases.
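Steps a through d can be sketched in Python as follows. This is a minimal illustration, not the program described in the paper. The exclusion rule in `is_excluded` is hypothetical: it was chosen only so that the two notations of figure 3 yield the scan symbols shown there (TNQ and QVG), and the paper's actual exclusion list is not reproduced in this section.

```python
MAX_COLUMNS = 48  # notation length limit of the layout (spaces 73-120)


def is_excluded(notation: str, i: int) -> bool:
    """Hypothetical exclusion rule (an assumption, not the paper's list)."""
    ch = notation[i]
    if ch == " " or ch.isdigit() or ch == "J":
        return True
    # Assumption: a letter directly after a blank is a ring locant.
    return i > 0 and notation[i - 1] == " "


def permute(notation: str) -> list:
    """Return (shift, index symbol) for every entry the notation generates."""
    entries = []
    for shift in range(min(len(notation), MAX_COLUMNS)):
        if notation[shift:shift + 2] == "  ":   # step d: two blanks end the notation
            break
        # Step b: the initial symbol is always an index entry.
        if shift == 0 or not is_excluded(notation, shift):
            entries.append((shift, notation[shift]))
    return entries


print(permute("T6NJ BQ"))   # [(0, 'T'), (2, 'N'), (6, 'Q')]
print(permute("QV2G"))      # [(0, 'Q'), (1, 'V'), (3, 'G')]
```

Each `shift` is the number of one-space moves to the left (step c) needed to bring that symbol into space 73; shifts whose symbol is excluded simply produce no entry.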

After each symbol in a line-notation has been considered and the
appropriate operations have taken place on each, the symbols stored in Memory IV
are returned to Memory I and the SCAN SYMBOLS are entered in area C for each entry
that has been returned from the operations in Memory III. Prior to the
acceptance of the second card, all information in Memory I is transferred
to magnetic tape for storage and future sorting. The magnetic tape
contains all the information generated from each line-notation; if printed
out at this stage, a listing as shown in figure 3 would be obtained. The
entire operation from card to storage of the permuted line-notations on tape
takes less than a second per card. The methodology described can be applied
to line-notations containing a maximum of 48 columns* on a punch-card. In
our  files,  approximately  of  the  compounds  require  twenty  columns  or  less. 
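The assembly of the records that go to tape can be sketched as below. The function `build_tape_records` and the tuple layout are assumptions made for illustration; the index positions are passed in directly so that the sketch does not depend on any particular exclusion list. The sample values are those legible in figure 3.

```python
def build_tape_records(accession, screen, notation, index_positions):
    """Assemble one record per index entry, as stored on tape before sorting."""
    # Memory IV: the index symbols collected while the notation was permuted
    # become the QUICK-SCAN symbols of area C, identical for every entry.
    scan = "".join(notation[p] for p in index_positions)
    records = []
    for p in index_positions:
        # Shift the notation left by p so that notation[p] sits in space 73,
        # i.e. at offset 48 inside the 96-space area D.
        area_d = (" " * (48 - p) + notation).ljust(96)
        records.append((accession, screen, scan, area_d))
    return records


records = build_tape_records("00001008", "Cl", "T6NJ BQ", [0, 2, 6])
for accession, screen, scan, area_d in records:
    print(accession, screen, scan, area_d.rstrip())
```

Printing the records shows the same notation three times, each time moved further left, with a different symbol under the index column.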

In some cases, it is desirable to include in the notation field certain
non-notational information, e.g., 95% pure, dimer, etc. To accomplish this the
notation proper is followed by two spaces and the remaining area (up to space
120) is used for any desired information. Information preceded by two spaces
will be printed out with the various permutations.
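The two-space convention can be expressed as a small helper. The name `split_notation_field` is an assumption, and the remarks attached to the sample notations are illustrative only.

```python
def split_notation_field(field: str):
    """Split a notation field into (notation proper, trailing remark).

    The notation proper ends at the first run of two blanks; single blanks
    inside the notation are left untouched.
    """
    notation, _, remark = field.rstrip().partition("  ")
    return notation, remark.strip()


print(split_notation_field("QV2G  95% PURE"))   # ('QV2G', '95% PURE')
print(split_notation_field("T6NJ BQ  DIMER"))   # ('T6NJ BQ', 'DIMER')
```

Only the notation proper would be permuted; the remark travels with every entry unchanged.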

In order to create an index, all of the taped permutations must now be
sorted alpha-numerically. This can be accomplished with an off-line tape
sorter which requires no computer time. The tape of sorted notations is used
to print out the index. The index column may be indicated either with a line
(specially printed paper) or with a printer-produced mark referencing the
index column at the top and bottom of each page. The example of a print-out
(figure 4) shows a page of permuted notations for some of the sulfur compounds
appearing in the Pesticide Index (2nd Edition). The address gives the page
where the drawn structure and data may be found.
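The sorting step can be illustrated with a short sketch. Two assumptions are made here: entries are ordered by the text that starts at the index column, with the text to its left as a tie-breaker, and Python's default character ordering stands in for the collating sequence of the tape sorter, which this section does not specify.

```python
def index_sort_key(entry):
    """Sort on the text from the index column onward, then on what precedes it."""
    notation, index_pos = entry
    return (notation[index_pos:], notation[:index_pos])


# (notation, position of the index symbol) for the two notations of figure 3.
entries = [
    ("T6NJ BQ", 0), ("T6NJ BQ", 2), ("T6NJ BQ", 6),
    ("QV2G", 0), ("QV2G", 1), ("QV2G", 3),
]

index = sorted(entries, key=index_sort_key)
for notation, pos in index:
    # The "|" marks the index column in this toy print-out.
    print(notation[:pos].rjust(8) + "|" + notation[pos:])
```

After sorting, all entries whose index symbol is the same letter stand together, which is what lets the printed index be used like a dictionary.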

*This is an arbitrary assignment; any length may be chosen. For the Honeywell
400 we extended it to 60 columns and found that this covered greater than 99.9%
of our file.
2. COST OF PREPARING AN INDEX.


The Univac File Computer (Model II) used for preparing the first large
index is a relatively slow machine with a low tape density (92 characters per
inch as compared to 536 for the Honeywell 400). It is, therefore, not
comparable to most of the computers available in industry today which can perform
this operation more economically. In order to make the dollar values more
meaningful, the figures given will be based upon our experience with an IBM
1401 at the T. R. Evans Research Center and the Honeywell 400 which has
replaced the Univac at the Data Processing Center of Edgewood.

The IBM 1401 was used for permuting 5991 line-notations. The input time
was approximately 1¼ hours (80 cards per minute). For the 5991 compounds, a
total of 33,080 entries were generated (5.5 entries/compound). These entries
were stored on one-third of a reel of magnetic tape. In other words, one
standard reel (2400 ft.) could hold all of the entries for the indexing of about
18,000 compounds, or 108,000 entries (assuming 6 entries/compound). The
alpha-numeric sorting time required for the 33,080 entries was 30 minutes. The
printing time (650 lines/minute) was 55 minutes. Therefore, the total machine time
(input to hard copy) was just under 3 hours. At a rental of $52/hr. and
allowing $15 for paper, the cost of preparing an index of the permuted line-notations
of about 6,000 compounds, excluding labor and programming costs, would be
approximately $165, i.e., 2.75 cents/compound.
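The arithmetic behind these figures can be checked with a few lines of Python. All inputs are the numbers quoted in the paragraph above; the variable names are ours, and the two bounds on the machine cost are only one possible reading of "just under 3 hours".

```python
compounds = 5991
entries = 33_080

entries_per_compound = entries / compounds     # about 5.5
input_minutes = compounds / 80                 # 80 cards per minute, about 75 min
machine_minutes = 75 + 30 + 55                 # input + sorting + printing, as reported
machine_hours = machine_minutes / 60           # about 2.7, "just under 3 hours"

rental_per_hour = 52
paper = 15
low = machine_hours * rental_per_hour + paper  # cost using the reported times
high = 3 * rental_per_hour + paper             # cost if a full 3 hours is charged
cents_per_compound = 165 / 6000 * 100          # quoted $165 over about 6,000 compounds

print(round(entries_per_compound, 1), round(input_minutes),
      round(low), high, round(cents_per_compound, 2))
```

The quoted $165 lies between the two bounds (roughly $154 and $171), and dividing it by 6,000 compounds gives the stated 2.75 cents per compound.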

Experience gained with the Honeywell 400 computer at Edgewood Arsenal
leads us to predict that the 350,000 entries generated from 55,000 compounds
(6.3 entries/compound) could be stored on 3 to 4 tapes whereas 21 tapes were
required for the Univac II. Input requires about three hours, sorting time
4 hours, and printing time (900 lines/minute) 5 to 6 hours. Once again
using a reasonable rental of $52/hr., the 55,000 compounds (350,000 entries)
could be indexed for approximately $850, including $200 for paper. This
averages out to about 1½ cents per compound.

In addition to actual operation costs, the cost of the program for
the computer must be considered. This is a one-time job for a given
computer. The program written for the Univac took 100 hours (including
debugging time). Since programming usually costs about $10/hour, $1000 would
be a good estimate of the cost for writing the first program. However, the
program subsequently written for the 1401 required only 24 hrs. or $240.
Since both programs are available on request, the cost for a program on a
different model should be comparable to that required for the 1401. In other
words, for about $1100 one could generate an index of permuted line-notations
for 55,000 compounds.

Generation of 55,000 line-notations from chemical structures would cost
about $10,000 (or 18 cents/notation). The rates used to arrive at this
figure are summarized in figure 5. It should be remembered that notations, once
prepared, will serve as input not only for the first index but subsequent ones
as well.
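The totals can be reproduced from the rates as we read them in figure 5 (500 notations per day at $40/day for writing and for proofing, 2000 cards per day at $16/day for punching and for verifying, $0.00125 per card, $120 key-punch and verifier rental) together with the index costs given in the text ($240 program, 13 machine hours at $52, $200 paper). The sketch below is only a check of the arithmetic; the variable names are ours.

```python
notations = 55_000

# Input preparation (line-notations and punch-cards).
writing = notations / 500 * 40           # 110 days at $40/day
proofing = writing                       # same time and rate as writing
punching = notations / 2000 * 16         # 27.5 days at $16/day
verifying = punching                     # same time and rate as punching
labor = writing + proofing + punching + verifying
cards = round(notations * 0.00125)       # cost of the blank cards
keypunch_rental = 120
input_total = labor + cards + keypunch_rental
cents_per_notation = input_total / notations * 100

# Index preparation.
index_total = 240 + 13 * 52 + 200        # program + machine rental + paper

print(labor, cards, input_total, round(cents_per_notation, 1), index_total,
      input_total + index_total)
```

Under these assumptions the input side comes to $9869 (about 18 cents per notation) and the index side to $1116, for a combined $10,985.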

3.  THE  INDEX 

For 55,000 line-notations, an index of 7173 pages was obtained. There
was a maximum of 49 entries per page, i.e., over 350,000 entries were
generated. These pages were divided into 24 volumes. For ease of handling, each
volume was cut to 11 x 14 inches and bound with hard covers. The back of
each volume was labeled in the same manner as an encyclopedia. After
binding, the books occupied 40 inches of shelf space.

4. USE OF THE INDEX

In order to test the usefulness of the index some 25 structure searches
were carried out. Each of these searches had been made previously using
molecular formulae, a fragmentation code, and tabulated listings of line-notations
in alpha-numeric order. In each case, the index was found to be significantly
faster and a greater number of compounds meeting the search criteria were found.
Some searches which had required a full day were completed in a matter of
minutes using the index.

The index is used in the same manner as a dictionary. Both specific and
general searches including analogs and homologs may be run at one's desk.
There is no further need to use any mechanical equipment in locating desired
structures.

a.  Specific  Look-up 

For a specific structure, one prepares the line-notation and looks it up
in the index. This process has proven to be faster than writing out a
molecular formula and locating it in a molecular formula file. Each notation
represents but one compound whereas each formula may represent many compounds.
Since the notation is unique and unambiguous, it can be found in a specific
location in the index. An average structure requires about 15 seconds to
encode and a notation can be found in the index as fast as a word can be located
in a dictionary.
b. General Searches


The procedure for running general searches is nearly as simple as the
specific look-up. One decides which symbols meet the requirements of the
request and either locates these in the index or assigns a clerical worker
to the job. The frequency of occurrence for index symbols would determine
the starting place in the index. Table 1 presents the frequency of index
symbols for our first index. This table could be extended to include a more
detailed break-down (beyond the first index symbol), if desired. It is not
necessary to know what the line-notations represent to find entries in the
index, just as it is not necessary to know the meaning of a word to find it
in a dictionary.

With  an  index  of  permuted  line-notations  the  individual  in  charge  of 
chemical  structure  retrieval  could  rapidly  answer  such  questions  as: 

(1)  How  many  quinoline  derivatives  are  on  file? 

(2)  Which  quinolines  contain  a  nitro  group  as  well 
as  a  hydroxyl  group? 

(3)  How  many  aliphatic  alkynes  have  been  investigated 
and  which  ones  contain  chlorine? 

All  of  these  questions  could  be  answered  routinely.  They  would  be 
handled  as  follows: 

(1) Open the T volume of the index to the T66 BNJ
section (a matter of seconds).

(2) Visually check the QUICK-SCAN area of the
quinoline section, checking off those for which a NW
and Q appear (for the nitro- and hydroxyl-groups,
respectively).

(3) Open the index to the UU section, count the
compounds not containing an L (carbocyclic), T
(heterocyclic), or R (benzenoid) in the QUICK-SCAN area.
Check off those which do have a G in the QUICK-SCAN
area and list them.

TABLE 1

FREQUENCY* OF OCCURRENCE FOR INDEX SYMBOLS

INDEX SYMBOL    RANK    FREQUENCY

A                23        1731
B                24        1361
C                16        4565
E                19        3624
F                15        8430
G                 7       23524
R                 4       31604
S                 9       19547
T                 6       24401
U                10       17003
V                 3       35617
W                12        8941
Z                11

*These frequencies are for indexed symbols, and do not include those excluded
by the program (e.g., the T count is for T's which initiate ring systems, not
T's indicating ring saturation).

A clerk with no chemical background could find the answers to each of
the questions given above if provided with a list of appropriate search symbols.
If a large number of searches were to be run every day this would be the most
economical approach. The frequency of searches may well determine the need to
perform computer searches. Since the permuted notations are already on
magnetic tape, a search program can be written and performed.
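What such a search program might look like is sketched below. It mirrors the manual procedure: find the section of the index that begins with the wanted symbols, then check the QUICK-SCAN symbols. The function `search`, the record layout, and the sample records (built from the two notations legible in figure 3) are assumptions for illustration and not a description of any program the authors wrote.

```python
def search(entries, index_prefix, required_scan=()):
    """Return accession numbers of entries that begin with index_prefix at the
    index column and whose QUICK-SCAN symbols contain every required symbol."""
    hits = []
    for accession, scan, notation, pos in entries:
        at_index = notation[pos:]                 # text from the index column on
        if at_index.startswith(index_prefix) and all(
            symbol in scan for symbol in required_scan
        ):
            hits.append(accession)
    return sorted(set(hits))


# (accession, QUICK-SCAN symbols, notation, position of the index symbol)
entries = [
    ("00001008", "TNQ", "T6NJ BQ", 0),
    ("00001008", "TNQ", "T6NJ BQ", 2),
    ("00001008", "TNQ", "T6NJ BQ", 6),
    ("00021101", "QVG", "QV2G", 0),
    ("00021101", "QVG", "QV2G", 1),
    ("00021101", "QVG", "QV2G", 3),
]

print(search(entries, "T6NJ", required_scan=("Q",)))   # ['00001008']
print(search(entries, "V", required_scan=("G",)))      # ['00021101']
```

On a sorted tape the first condition selects one contiguous block of entries, so a real program would only need to read that block rather than the whole file.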

When a large number of compounds are found that meet the search
requirements, it has been found useful to make a Xerox reproduction of the index
pages instead of writing down the numbers for each compound. This avoids
transcription errors which mount up quickly for long lists of numbers.

It is realized that the aforementioned questions are not complex.
However, these are the types of questions most frequently asked. With the
index, such questions take only minutes to answer. Any increase in the number
of parameters or specificity shortens the search time required by reducing
the search to a smaller section of the index.

c.  Limitations 

The only questions the index is not designed to handle efficiently are
those involving the relative positions of every atom within a molecule, e.g.,
which compounds in the file contain a nitrogen atom three atoms from any
oxygen atom and two carbons removed from a sulfur atom? In Wiswesser
line-notations, symbols generally represent groups of atoms rather than an
individual atom. This prevents the use of the index for such searches. However,
it is not a limitation of the notation, for a computer could be programmed
to use the notation for such searches.

d. Who Needs to Know the Notation

Only one or two chemists within an organization need to know the Wiswesser
line-notation to make the index a useful means of retrieving chemical
structures. The chemist or administrator can request information by structure(s)
and receive answers in the same form. After compound numbers are found in the
index, structure cards can be pulled, reproduced, and sent to the requester.
The notation serves only as a means by which compounds are located.

e.  Potential 

As demonstrated, the index is a very powerful tool for keeping track of
a file of chemical compounds. However, it has an even greater potential. For
example, it could expedite procurement procedures. An organization which
routinely purchases large numbers of materials from commercial suppliers could encode
the compounds listed in available catalogues and include the source and price
of each item. Preparation of an index would then permit rapid determination
of compound availability, the companies offering an item, and their listed
prices. All of this information would be found in the same section of the index,
under the specific notation.
A second possibility is the generation of a functional group index
for all chemical structures appearing in a catalog, textbook, journal, or
secondary source publication.

Another possibility that is now being explored is the incorporation of
biological data into the index. The inclusion of such data would permit
investigation of structure-activity relationships. A glance at a given section
would reveal how many compounds contained a given ring structure(s), functional
group(s), or combinations thereof, as well as the type and level of activity
exhibited. This could be a powerful aid to a Director of Research or his
assistants. Any type of data could be included; the possibilities are unlimited.
The data can be ordered by any desirable parameter with the linearized
structures associated with it for comparison.

f. Updating the Index

Updating the index does not present a problem. Since the program is
available, supplements, which will include compounds received after the major
index was generated, can be prepared at suitable intervals. When the
supplements become too numerous for easy searching, the tapes used for each index
can be blended and used to create a new master index. The frequency of
updating would depend upon the growth rate of the file.
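Blending already-sorted tapes into a new master is a standard merge of sorted runs. The sketch below uses Python's `heapq.merge` for this; the function name `merge_indexes` and the (index text, accession number) record layout are assumptions, and the sample entries reuse the two notations of figure 3.

```python
import heapq


def merge_indexes(master, *supplements):
    """Merge already-sorted entry lists into one sorted master list.

    Each input must already be in index order; the inputs are read
    sequentially, much as sorted tapes would be.
    """
    return list(heapq.merge(master, *supplements))


# (text from the index column onward, accession number)
master = [("NJ BQ", "00001008"), ("Q", "00001008"), ("T6NJ BQ", "00001008")]
supplement = [("G", "00021101"), ("QV2G", "00021101"), ("V2G", "00021101")]

new_master = merge_indexes(master, supplement)
for index_text, accession in new_master:
    print(accession, index_text)
```

Because every input is already sorted, the merge needs only one pass over each tape and no re-sorting of the full file.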

SUMMARY 

The preparation of a computer-produced index of permuted Wiswesser
chemical line-notations is described. The uses and limitations of this
powerful and economical retrieval tool are discussed. The utility of such
an index may be markedly increased by the inclusion of biological, source,
cost, etc., data.
FIGURE 1

INFORMATION PATH FROM CARD TO TAPE


FIGURE 2

REPRESENTATION OF ONE SECTION OF INPUT-OUTPUT MEMORY IN COMPUTER

MEMORY I
FIGURE 3


EXAMPLE OF MAGNETIC TAPE PRINT-OUT
PRIOR TO SORTING


                       INDEX COLUMN
                             |
00001008  Cl  TNQ            T6NJ BQ
00001008  Cl  TNQ          T6NJ BQ
00001008  Cl  TNQ      T6NJ BQ
00021101  6   QVG            QV2G
00021101  6   QVG           QV2G
00021101  6   QVG         QV2G
FIGURE 4


EXAMPLE OF PRINT-OUT


PAGE OF PERMUTED NOTATIONS FOR SOME SULFUR COMPOUNDS
APPEARING IN PESTICIDE INDEX (2ND EDITION)


P12  130A  3b  HGNRSWR 
P12  1528  37  GSWRG 
P12  211A  37  GRSWRG 
P12  112C  38  ZSl^HOPOO 
P12  098E  38  ZSWROPOO 
P12  112D  38  ZSWROPSO 
P12  130F  38  MSWROPSO 
P12  lk9G  38  YMSWRDPSO 
P12  098F  38  ZSHHDPSO 
P12  052C  37  ZSWRG 
P12  112S  85  TNNRSWZNO 
P12  09)iA  8  OPSSXVM 
P12  OU7B  C7  TVNVUSXGGO 
P12  135a  G7  TVNVSXGGQ 
P12  210D  G7  TVNVUSXGGYGG 
P12  139A  X5  T0SSXVN\NUUQQ 
P12  093E  8  OPSSSr 
P12  2228  8  OPOSSTVM 
P12  0908  38  OPSSTRCN 
P12  19UC  P3  TNSYSNUS 
P12  213C  C3  INSISNUS 
P12  0378  3  NIUSSIUM 
P12  128c  3  NTUSSYUai 
P12  095b  38  OPSORS 
P12  130D  38  OPSORS 
P12  130E  38  OPSMCRS 
P12  167C  38  SPORS 
P12  lOOA  C3  TNNNSMT 
P12  2008  C3  TNNNSM 
P12  1678  C3  TNNNSMYM 
P12  02U  C3  TNNNSMIM 
P12  095C  38  OPSORS _ 


NOTATION 


Index  Column 


*Refers to page number appearing in Pesticide Index
FIGURE 5

TIME AND COST ESTIMATES FOR AN INDEX OF PERMUTED
LINE-NOTATIONS FOR 55,000 COMPOUNDS


INPUT (Line-notations)

                            TIME (days)         COST ($)

WRITING NOTATIONS           110 (500/day)       4400 ($40/day)

PROOFING NOTATIONS          110                 4400

PREPARING PUNCH-CARDS       27.5 (2000/day)      440 ($16/day)

VERIFYING PUNCH-CARDS       27.5                 440

                                        TOTAL  $9680

COST OF 55,000 CARDS at $0.00125/CARD             69

COST FOR MACHINE RENTAL (KEY-PUNCH & VERIFIER)   120

                                               $9869

INDEX

PROGRAM COST                                     240

MACHINE RENTAL (13 Hrs, $52/Hr.)                 676

PAPER                                            200

                                        TOTAL  $1116

TOTAL COST FOR INDEXING 55,000 STRUCTURES    $10,985
ABSTRACT

This is the third paper in a series concerned with the use of the Wiswesser
line-notation as the basis of a storage and retrieval system for organic
structures. This line-notation is a unique and unambiguous method of representing
chemical structures by a linear series of letters, numbers, ampersands, and
hyphens. The uniqueness of the notation permits the use of an alpha-numerically
arranged index of the permutations of the notation for structure searching. The
present paper discusses the preparation of a computer-prepared index of permuted
Wiswesser line-notations from the standpoint of the cost, time, and logic of the
computer program. The entire cost of preparing such an index of 50,000
structures, i.e., writing and proofing the notations, punching and verifying the
punch-cards, card and paper costs, rental, and program preparation, is somewhat less
than $12,000. The uses, limitations, and methods of further increasing the
utility of this powerful and economical retrieval tool by the inclusion of biological,
source, cost, and other types of data are also presented.